A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints
Abstract
This paper describes an approach to the diagnostic evaluation of machine translation (MT) based on linguistic checkpoints, which can provide valuable information both to the developers and to the end-users of MT systems. We present a flexible framework and a new tool, DELiC4MT, for fine-grained diagnostic MT evaluation which can be extended to any language pair and applied to any evaluation target, once the phenomena of interest are covered by the linguistic analysis. As a case study, we evaluate the CoSyne MT software against four leading web-based MT systems across a set of linguistic phenomena for three language pairs (from German, Italian and Dutch into English).
Related papers
Woodpecker: An Automatic Methodology for Machine Translation Diagnosis with Rich Linguistic Knowledge
Unlike “black-box” evaluation, diagnostic evaluation aims to provide better explanatory power into various aspects of the performance of artificial intelligence systems. However, for machine translation (MT) systems, due to their complexity and knowledge dependency, such diagnostic evaluation often demands a large amount of manual work. To tackle this problem, we propose an auto...
Relating Translation Quality Barriers to Source-Text Properties
This paper aims to automatically identify which linguistic phenomena represent barriers to better MT quality. We focus on the translation of news data for two bidirectional language pairs: EN↔ES and EN↔DE. Using the diagnostic MT evaluation toolkit DELiC4MT and a set of human reference translations, we relate translation quality barriers to a selection of 9 source-side PoS-based linguistic chec...
Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
We present a diagnostic evaluation platform which provides multi-factored evaluation based on automatically constructed check-points. A check-point is a linguistically motivated unit (e.g. an ambiguous word, a noun phrase, a verb~obj collocation, a prepositional phrase etc.) that is pre-defined in a linguistic taxonomy. We present a method that automatically extracts check-points from parall...
A Diagnostic Evaluation Approach Targeting MT Systems for Indian Languages
This paper addresses diagnostic evaluation of machine translation (MT) systems for Indian languages, English to Hindi translation in particular. Evaluation of MT output is an important but difficult task. The difficulty arises primarily from some inherent characteristics of the language pairs, which range from simple word-level discrepancies to more difficult structural variations for Hindi fro...
A Web Application for the Diagnostic Evaluation of Machine Translation over Specific Linguistic Phenomena
This paper presents a web application and a web service for the diagnostic evaluation of Machine Translation (MT). These web-based tools are built on top of DELiC4MT, an open-source software package that assesses the performance of MT systems over user-defined linguistic phenomena (lexical, morphological, syntactic and semantic). The advantage of the web-based scenario is clear; compared to the ...